Research Question
How left or right are EU directives and regulations introduced between
1989 and 2024?
The directives and regulations used in this research come from two
sources. The first is the CEPS
EurLex dataset, which contains around 70k directives and regulations
from 1989 to 2019. The second is the EUR-Lex website, an official EU
website providing access to EU law. I scraped EUR-Lex with the eurlex R
package to retrieve post-2019 legislation up to
August 2024 (the time of writing of this report).
EUR-Lex also provides a summary for some directives and regulations. Summaries are provided on a case-by-case basis.
Count of Directives and Regulations
##
## Directive Regulation
## 3034 72536
Oldest Legislation
## [1] "1989-01-02"
Newest Legislation
## [1] "2024-08-14"
Directives and Regulations by Year
Number of Summaries
## [1] 1637
Summary Count of Regulations and Directives
##
## Directive Regulation
## 625 1012
Oldest Summary
## [1] "1989-02-13"
Newest Summary
## [1] "2024-02-28"
Summaries of Directives and Regulations by Year
All Directives and Regulations (N = 74,734)
Please note the log-scaled y-axis
Negative values: More left
Positive values: More right
The dotted red line marks the ideological middle (value of 0)
All Summaries (N = 1637)
Please note the log-scaled y-axis
Negative values: More left
Positive values: More right
The dotted red line marks the ideological middle (value of 0)
Scatter Plot with Line of Equality
The red dashed line of equality shows where the values from the preamble
and the summary would be equal. Points below the line indicate that the
preamble value is greater than the summary value (i.e., more right), and
points above indicate the opposite. The plot applies jitter to
counteract overlapping points (e.g., at 0,0).
Mean-Difference Plot (Bland-Altman Plot)
The red dashed line of equality shows where the values from the preamble
and the summary would be equal. Points below the line indicate that the
preamble value is lower than the summary value (i.e., more left), and
points above indicate the opposite. The plot applies jitter to
counteract overlapping points.
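The quantities behind a mean-difference plot can be sketched in a few lines. This is illustrative Python, not the code behind this report; the function name and the three score pairs are hypothetical, and the difference is taken as preamble minus summary so that points below zero mean the preamble scored lower (more left).

```python
from statistics import mean

def mean_difference(preamble, summary):
    """Return (pair mean, preamble minus summary) for each pair of scores."""
    if len(preamble) != len(summary):
        raise ValueError("both score lists must have the same length")
    return [((p + s) / 2, p - s) for p, s in zip(preamble, summary)]

# Hypothetical scores for three acts (not taken from the dataset).
preamble_scores = [-2.0, 0.0, 1.0]
summary_scores = [0.0, 0.0, 3.0]

points = mean_difference(preamble_scores, summary_scores)
# The average difference is the overall bias between the two measurements.
bias = mean(diff for _, diff in points)
print(points)
print(bias)
```

A negative bias in this toy data corresponds to summaries being scored further to the right than preambles on average.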
In general, we can observe that summaries tend to be scored more to the right. We can also observe that summaries often receive a score of 0, while the (longer) preamble contains more data and thus receives a more differentiated score.
Do the results differ when more preprocessing steps are applied? Rheault and Cochrane (2020) perform subsampling, remove digits and words with two letters or fewer, and remove English stop words as well as overly common words in their corpus.
I apply the same steps to a random subset of 5,000 legislative acts, using a subset to keep computation time low. I then compare the results with those obtained without the additional preprocessing steps.
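A minimal sketch of such a preprocessing pipeline, written in Python for illustration only, might look as follows. The stop word list is a tiny placeholder, the function names are hypothetical, and subsampling as well as the removal of corpus-specific common words are left out for brevity.

```python
import random
import re

# Tiny illustrative stop word list; a real run would use a full English list.
STOP_WORDS = {"the", "and", "for", "that", "with", "this", "shall"}

def preprocess(text, stop_words=STOP_WORDS, min_length=3):
    """Lowercase, then drop digits, words of two letters or fewer, and stop words."""
    # The pattern keeps runs of letters only, so digits and punctuation vanish.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if len(t) >= min_length and t not in stop_words]

def sample_documents(documents, k, seed=42):
    """Draw a reproducible random subset of at most k documents."""
    rng = random.Random(seed)
    return rng.sample(documents, min(k, len(documents)))

example = "The Regulation (EC) No 1234/2007 shall apply to the import of beef."
print(preprocess(example))
```

The example sentence keeps only four of its tokens, which illustrates how much text these steps can remove from short documents.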
Eyeballing the results, legislative acts with additional preprocessing tend to drift towards the right. Clusters are also visible, e.g. in the RoBERT_rile plot, where acts that previously scored between -4 and +2 receive a score of 0 after preprocessing. This makes sense because the preprocessing steps remove data, making it more difficult for the model to generate a score.
Latent Semantic Scaling (LSS) is a method for measuring the semantic similarity between documents and a set of seed words. In our case, each directive or regulation is a document and the seed words are “typical” left-right terms. The semantic similarity between documents and seed words is calculated using word embeddings provided by GloVe. To keep computation efficient, I use the smallest pre-trained word vectors model available (6B tokens, 400K vocab, 50 dimensions).
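The core idea can be sketched with toy vectors. The following Python is a simplified illustration under stated assumptions, not the implementation used for this report: the two-dimensional vectors stand in for the 50-dimensional GloVe vectors, the seed weights (+1 for left, -1 for right) are hypothetical, and multi-word seed terms are not handled.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def word_polarity(word, vectors, seeds):
    """Mean cosine similarity to the seed words, weighted by seed polarity."""
    total = sum(cosine(vectors[word], vectors[s]) * w for s, w in seeds.items())
    return total / len(seeds)

def document_polarity(tokens, vectors, seeds):
    """Average polarity of all tokens that have a word vector."""
    known = [t for t in tokens if t in vectors]
    if not known:
        return 0.0  # nothing to score
    return sum(word_polarity(t, vectors, seeds) for t in known) / len(known)

# Toy 2-dimensional vectors standing in for pre-trained embeddings.
vectors = {
    "welfare": [1.0, 0.1],
    "subsidies": [0.9, 0.2],
    "privatization": [0.1, 1.0],
    "deregulation": [0.2, 0.9],
    "market": [0.3, 0.8],
}
# Assumed sign convention: left seeds positive, right seeds negative.
seeds = {"welfare": 1.0, "privatization": -1.0}

left_doc = document_polarity(["subsidies", "welfare"], vectors, seeds)
right_doc = document_polarity(["market", "deregulation"], vectors, seeds)
print(left_doc, right_doc)
```

In this sketch a document whose tokens have no word vector falls back to a score of 0, which is one design choice among several.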
In a first approach, I manually come up with my own seed words:
Economic left seed words
## [1] "wealth redistribution" "state ownership"
## [3] "public sector jobs" "universal basic income"
## [5] "progressive income tax" "welfare programs"
## [7] "labor unions" "government subsidies"
## [9] "public healthcare" "social housing"
## [11] "minimum wage increase" "unemployment benefits"
## [13] "state intervention" "public education funding"
## [15] "affordable housing initiatives"
Economic right seed words
## [1] "free market capitalism" "privatization"
## [3] "corporate tax cuts" "deregulation"
## [5] "fiscal austerity" "trade liberalization"
## [7] "supply side economics" "property rights"
## [9] "entrepreneurial incentives" "limited government"
## [11] "investment freedom" "business deregulation"
## [13] "lower tax rates" "market driven wages"
## [15] "reduction in public spending"
Social left seed words
## [1] "lgbtq rights" "gender equality"
## [3] "reproductive rights" "anti-discrimination laws"
## [5] "affirmative action" "marriage equality"
## [7] "racial justice" "environmental justice"
## [9] "prison reform" "immigrant integration"
## [11] "workers rights" "secularism"
## [13] "universal human rights" "income equality"
## [15] "cultural diversity" "refugee rights"
## [17] "climate justice" "social inclusion"
## [19] "anti-austerity" "freedom of movement"
## [21] "multiculturalism"
Social right seed words
## [1] "traditional family values" "pro-life"
## [3] "national sovereignty" "law and order"
## [5] "patriotism" "immigration control"
## [7] "anti multiculturalism" "individualism"
## [9] "christian heritage" "border security"
## [11] "conservatism" "cultural identity protection"
## [13] "anti lgbtq adoption" "cultural homogeneity"
## [15] "anti secularism" "tightening asylum policies"
## [17] "pro national identity"
In a second approach, I apply two different models to extract seed words from text. I apply Wordscores to extract seed words from party manifestos. I also use Wordfish to extract seed words from existing legislation.
I apply the Wordscores model using reference scores from the Manifesto Project.
Specifically, I use rile […]. The following tables display
the top and bottom Wordscores scores
for terms extracted from the manifestos. Negative values are associated
with left-wing terms, positive values with right-wing terms.
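As a rough illustration of how Wordscores turns reference scores into word scores, the Python sketch below follows the standard formulation: each word's score is the average of the reference scores, weighted by the probability of seeing the word in each reference text. The two "manifestos" and their scores are invented for the example and do not come from the Manifesto Project.

```python
from collections import Counter

def wordscores(reference_texts, reference_scores):
    """Score each word as the probability-weighted mean of reference scores."""
    rel_freq = {}
    for name, tokens in reference_texts.items():
        counts = Counter(tokens)
        total = sum(counts.values())
        rel_freq[name] = {w: c / total for w, c in counts.items()}
    vocabulary = set().union(*rel_freq.values())
    scores = {}
    for word in vocabulary:
        # Relative frequency of the word in every reference text.
        freqs = {r: rel_freq[r].get(word, 0.0) for r in rel_freq}
        denom = sum(freqs.values())
        scores[word] = sum(f / denom * reference_scores[r] for r, f in freqs.items())
    return scores

# Hypothetical tokenised reference texts and reference scores.
reference_texts = {
    "left_party": ["equality", "welfare", "welfare", "tax"],
    "right_party": ["market", "market", "tax", "security"],
}
reference_scores = {"left_party": -20.0, "right_party": 30.0}

scores = wordscores(reference_texts, reference_scores)
for word in sorted(scores, key=scores.get):
    print(word, scores[word])
```

Words used by only one party inherit that party's score, while words used equally by both end up halfway between the two reference scores.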
RILE: Right-left position of party as given in Michael Laver/Ian Budge
Most negative values (left-wing terms)
## token
## gender
## equality
## proposes
## token
## die_linke
## sinn_féin
## labor_party
## token
## children_young_people
## greenhouse_gas_emissions
## sexual_orientation_gender
## token
## sexual_orientation_gender_identity
## convention_rights_persons_disabilities
## reduce_greenhouse_gas_emissions
More results: Unigrams | Bigrams | Trigrams | 4-grams
Most positive values (right-wing terms)
## token
## vvd
## sgp
## svp
## token
## christian_union
## progress_party
## vlaams_belang
## token
## free_movement_persons
## vlaams_belang_advocates
## danish_people's_party
## token
## new_nuclear_power_plants
## proliferation_weapons_mass_destruction
## small_medium-sized_enterprises_smes
More results: Unigrams | Bigrams | Trigrams | 4-grams
Planeco: per403 + per404 + per412
Most negative values (left-wing terms)
## token
## sweden
## swedish
## sgp
## token
## progress_party
## canary_islands
## sweden_democrats
## token
## social_democratic_party
## danish_people's_party
## upper_secondary_school
## token
## government_pension_fund_global
## earned_income_tax_credit
## autonomous_community_canary_islands
More results: Unigrams | Bigrams | Trigrams | 4-grams
Most positive values (right-wing terms)
## token
## proposes
## banks
## ecolo
## token
## social_housing
## die_linke
## cdh_proposes
## token
## nuclear_power_plants
## social_security_system
## social_security_contributions
## token
## european_convention_human_rights
## convention_rights_persons_disabilities
## sexual_orientation_gender_identity
More results: Unigrams | Bigrams | Trigrams | 4-grams
Markeco: per401 + per414
Most negative values (left-wing terms)
## token
## canary
## islands
## greens
## token
## canary_islands
## labor_party
## green_party
## token
## greenhouse_gas_emissions
## children_young_people
## green_new_deal
## token
## sexual_orientation_gender_identity
## convention_rights_persons_disabilities
## equal_pay_equal_work
More results: Unigrams | Bigrams | Trigrams | 4-grams
Most positive values (right-wing terms)
## token
## fdp
## vvd
## svp
## token
## conservative_party
## free_democrats
## progress_party
## token
## free_movement_persons
## social_market_economy
## progress_party_work
## token
## small_medium-sized_enterprises_smes
## proliferation_weapons_mass_destruction
## corporate_income_tax_rate
More results: Unigrams | Bigrams | Trigrams | 4-grams
Welfare: per503 + per504
Most negative values (left-wing terms)
## token
## svp
## vvd
## fdp
## token
## vlaams_belang
## party_animals
## christian_union
## token
## social_market_economy
## small_medium-sized_enterprises
## free_movement_persons
## token
## akel_left_new_forces
## take_advantage_opportunities_offered
## corporate_income_tax_rate
More results: Unigrams | Bigrams | Trigrams | 4-grams
Most positive values (right-wing terms)
## token
## sinn
## féin
## denk
## token
## die_linke
## cdh_proposes
## sinn_féin
## token
## mental_health_services
## children_young_people
## sinn_féin_priorities
## token
## equal_pay_equal_work
## convention_rights_persons_disabilities
## sexual_orientation_gender_identity
Wordfish is an unsupervised method, meaning that it estimates the positions of documents solely from the observed word frequencies. Due to computational constraints, I could only run the Wordfish model on random samples of 5k and 10k legislative acts. I believe that including more, or even all 70k, acts would either not change the results much or not justify the longer compute time.
The tables below display features (i.e., tokens) and their respective beta (i.e., the estimated effect of the token on the latent dimension). A positive beta value indicates that the word is associated with the positive side of the dimension, a negative beta value with the negative side. As the tables below show, the results are difficult to interpret: tokens on both ends of the latent dimension (i.e., those with the highest and lowest beta values) cannot be assigned to a political dimension.
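To make the role of beta concrete: Wordfish models the count of a word in a document as Poisson-distributed with rate exp(alpha + psi + beta * theta), where alpha is a document effect, psi a word effect, and theta the document's latent position. The Python sketch below only evaluates that rate for hypothetical parameter values; it does not estimate the model.

```python
import math

def expected_count(alpha, psi, beta, theta):
    """Poisson rate for one word in one document under the Wordfish model."""
    return math.exp(alpha + psi + beta * theta)

# Hypothetical parameters: two documents at opposite ends of the latent scale.
theta_low, theta_high = -1.0, 1.0
alpha = 0.0   # document fixed effect (related to document length)
psi = 1.0     # word fixed effect (related to overall word frequency)

for beta in (-2.0, 0.0, 2.0):
    low = expected_count(alpha, psi, beta, theta_low)
    high = expected_count(alpha, psi, beta, theta_high)
    print(f"beta={beta:+.1f}: rate at theta=-1 is {low:.2f}, at theta=+1 is {high:.2f}")
```

A word with a beta of zero is expected equally often at both ends of the scale, whereas a large positive or negative beta means the word is concentrated at one end.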
Wordfish Scores: Random Subsample of 5000 Legislative Acts (Unigrams, highest and lowest beta values)
## feature beta
## 1 efsa 43.75096
## 2 incl 41.48821
## 3 imo 41.14718
## 4 circ 39.79716
## 5 itu-r 37.67625
## 6 consol 35.14699
## 7 butyrate 32.35146
## 8 methylthio 29.25866
## 9 formate 29.04137
## 10 coe 28.60147
## feature beta
## 1 kolejowej -1.441089
## 2 ortslagen -1.434837
## 3 stary -1.431810
## 4 groß -1.428269
## 5 południowy -1.425471
## 6 drogi -1.422172
## 7 sofern -1.420133
## 8 drogę -1.418893
## 9 następnie -1.417684
## 10 wyznaczonej -1.416342
Wordfish Scores: Random Subsample of 5000 Legislative Acts (Bigrams, highest and lowest beta values)
## feature beta
## 1 contained_housing 51.48734
## 2 free_free 46.04209
## 3 monolithic_integrated 45.14960
## 4 form_monolithic 44.09294
## 5 letters_identification 30.71391
## 6 ecu_net 22.21416
## 7 ecus_ecu 22.12461
## 8 ecu_cus 22.00006
## 9 code_codice 21.49203
## 10 using_tail 19.83098
## feature beta
## 1 legally_binding -0.4740856905
## 2 international_agreements -0.4612420543
## 3 national_food -0.0140582032
## 4 international_obligations -0.0009404775
## 5 rice_sector 0.0008225689
## 6 landing_obligation 0.0016989285
## 7 non-personal_data 0.0017392246
## 8 data_holders 0.0017393568
## 9 data_altruism 0.0017417777
## 10 recognised_data 0.0018706427
Wordfish Scores: Random Subsample of 10,000 Legislative Acts (Bigrams, highest and lowest beta values)
## feature beta
## 1 width_height 0.25166199
## 2 currently_force 0.21852978
## 3 free_text 0.10477695
## 4 special_edition 0.02915246
## 5 responsible_agency 0.01467287
## 6 list_responsible 0.01467231
## 7 qualifier_n.a 0.01467154
## 8 agency_n.a 0.01467150
## 9 number_n.a 0.01467080
## 10 related_place 0.01467034
## feature beta
## 1 non-defaulted_applicable -210.20974
## 2 deliveries_non-defaulted -184.90778
## 3 protection_applicable -158.09237
## 4 exposures_without -157.40835
## 5 applicable_mortgages -130.05261
## 6 mln_eur -121.98412
## 7 corporates_sme -121.98412
## 8 część_gminy -121.43343
## 9 corporates_credit -91.31062
## 10 without_privilege -84.66717
Neither method returns satisfactory results. Some terms returned by Wordscores faintly resemble the manual seed words. Almost all terms returned by Wordfish, on the other hand, cannot be assigned to a clear political side or topic. The Wordscores and Wordfish seed words will not be applied in the LSS method for the time being.
How well can LSS measure the left-right polarity of EU policies? To answer this question, I analyse the keywords that are attached to each document. The EU gives each policy a set of keywords that describe the policy’s content. If LSS works correctly, then the keywords should be associated with economic left/right terms. There are two sets of keywords, as described in the dataset’s codebook: EUROVOC keywords and Subject Matter keywords.
LSS calculates a polarity score for each document. Values range between ca. -2.5 and +2. A negative score is associated with right terms, a positive score with left terms. I create three bins of equal width, which therefore contain unequal numbers of observations, and label them “left”, “centre” and “right”.
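Equal-width binning can be sketched as follows. This is illustrative Python with hypothetical scores; the bin edges are derived from the minimum and maximum of the scores passed in, which is an assumption about how the bins were cut. Because negative scores are associated with right terms, the lowest bin is labelled "right" and the highest "left".

```python
def equal_width_bins(scores, labels):
    """Assign each score to one of len(labels) bins of equal width."""
    low, high = min(scores), max(scores)
    width = (high - low) / len(labels)
    binned = []
    for score in scores:
        if width:
            # The maximum falls on the upper edge, so clamp it into the last bin.
            index = min(int((score - low) / width), len(labels) - 1)
        else:
            index = 0  # all scores identical
        binned.append(labels[index])
    return binned

# Negative scores are right, positive scores are left.
labels = ["right", "centre", "left"]
scores = [-2.5, -1.2, -0.4, 0.1, 0.3, 2.0]  # hypothetical scores
bins = equal_width_bins(scores, labels)
print(bins)
```

Since the bins have equal width rather than equal size, a bin covering a densely populated part of the score range ends up holding far more documents than the others.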
Number of documents per bin
Economy
centre left right
68019 6277 438
Social
centre left right
26580 47133 1021
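Keyword frequencies per bin amount to a grouped count. A minimal Python sketch, assuming each document is a record with a bin label and a list of keywords (the field names and the three records are hypothetical):

```python
from collections import Counter

def top_keywords(documents, bin_label, n=5):
    """Count keywords over all documents in one bin; return the n most frequent."""
    counts = Counter()
    for doc in documents:
        if doc["bin"] == bin_label:
            counts.update(doc["keywords"])
    return counts.most_common(n)

# Hypothetical documents, each with a bin label and its attached keywords.
documents = [
    {"bin": "centre", "keywords": ["trade policy", "plant product"]},
    {"bin": "centre", "keywords": ["trade policy", "prices"]},
    {"bin": "left", "keywords": ["marketing", "health"]},
]

top_centre = top_keywords(documents, "centre", n=1)
print(top_centre)
```

Raw counts favour keywords that are frequent everywhere, so a bin's top keywords are most informative when read against the overall keyword frequencies.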
Most frequent keywords in economic “left” bin
EUROVOC_Keyword Occurences
1 PDO 502
2 product description 467
3 blockade 452
4 ban on sales 433
5 economic sanctions 425
Subject_matter_Keyword Occurences
1 marketing 1515
2 agricultural policy 1029
3 health 1021
4 agricultural activity 804
5 consumption 790
Most frequent keywords in social “left” bin
EUROVOC_Keyword Occurences
1 Community aid to exports 5934
2 entry price 5263
3 aubergine 5243
4 citron 5190
5 apple 4686
Subject_matter_Keyword Occurences
1 plant product 13369
2 trade policy 10386
3 prices 7992
4 trade 7936
5 tariff policy 7351
Most frequent keywords in economic “centre” bin
EUROVOC_Keyword Occurences
1 Community aid to exports 7570
2 automatic public tendering 5558
3 entry price 5514
4 aubergine 5463
5 citron 5371
Subject_matter_Keyword Occurences
1 plant product 17042
2 trade policy 14421
3 trade 10926
4 tariff policy 10286
5 prices 9357
Most frequent keywords in social “centre” bin
EUROVOC_Keyword Occurences
1 Community aid to exports 1649
2 automatic public tendering 1472
3 import 1453
4 sea fish 1434
5 catch plan 1313
Subject_matter_Keyword Occurences
1 trade policy 4444
2 plant product 3961
3 Europe 3905
4 trade 3204
5 tariff policy 3088
Most frequent keywords in economic “right” bin
EUROVOC_Keyword Occurences
1 beef 80
2 automatic public tendering 59
3 floor price 48
4 EC country 44
5 CCT duties 41
Subject_matter_Keyword Occurences
1 trade policy 145
2 animal product 84
3 prices 80
4 Europe 75
5 trade 67
Most frequent keywords in social “right” bin
EUROVOC_Keyword Occurences
1 agri-foodstuffs product 99
2 food product safety 97
3 fungicide 85
4 blockade 71
5 agricultural product 69
Subject_matter_Keyword Occurences
1 marketing 165
2 tariff policy 158
3 agricultural activity 146
4 trade policy 125
5 foodstuff 123
As a comparison, here are the most frequent keywords overall:
EUROVOC_Keyword Occurences
1 Community aid to exports 7590
2 automatic public tendering 5633
3 entry price 5518
4 aubergine 5481
5 citron 5387
Subject_matter_Keyword Occurences
1 plant product 17447
2 trade policy 14955
3 trade 11202
4 tariff policy 10597
5 prices 9460
A first analysis of the results shows that the keywords partially align with typical economic left/right terms. Keywords like “equal opportunity”, “State pension”, “employment”, “labour market” and “social protection” fit well with an economic left ideology, while “trade policy”, “tariff policy” and “international trade” can be more strongly associated with an economic right ideology.
However, there are keywords that appear to have no ideological connotation but still appear in the “most frequent” list, e.g., “exchange of information”, “beef” or “Europe”. This indicates that the polarity score calculated by LSS has its flaws.
Eyeballing the results, I feel that the Subject Matter keywords are more suitable for the evaluation than the EUROVOC keywords, as they align better with my expectations. This may be because the Subject Matter keywords are more general and thus capture a wider range of topics and themes within the dataset, while the EUROVOC keywords are too specific to do this.
Overview LSS Scores
LSS Economy vs LSS Social Scores
LSS vs Hix Høyland (normalised and reversed scores)
LSS scores are reversed in these plots: negative values are ideologically left, positive values are ideologically right.